Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts
Abstract
The aim of this paper is twofold. On the one hand, we focus on the task of dynamically annotating English compound nouns; on the other, we propose disambiguation methods and techniques that facilitate this annotation task. Both are part of a larger ongoing effort to create HPSG annotation for the texts from the Wall Street Journal (henceforth WSJ) sections of the Penn Treebank (henceforth PTB) with the help of a hand-written, large-scale, wide-coverage grammar of English, the English Resource Grammar (henceforth ERG; Flickinger (2002)). As we show in this paper, such annotations are linguistically very rich, since they incorporate semantics as well as syntax. This not only ensures that the treebank is a truly sharable, reusable and multi-functional linguistic resource, but also calls for better disambiguation of the internal (syntactic) structure of larger units of words, such as compound nouns, because that structure affects the representation of their meaning. Such meaning representations are of utmost interest if the linguistic annotation of a corpus is understood as the practice of adding interpretative linguistic information of the highest quality in order to give “added value” to the corpus.
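To make the compound-noun ambiguity concrete: a compound of n nouns admits Catalan(n-1) binary bracketings (1, 2, 5, 14, 42 for 2 to 6 nouns), and different bracketings can correspond to different meanings. The sketch below simply enumerates these bracketings; it is an illustration of the search space only, not the disambiguation method proposed in the paper, and the function names and example compound are hypothetical.

```python
from typing import List, Tuple, Union

# A bracketing is either a single noun (str) or a pair (left, right) of sub-bracketings.
Tree = Union[str, Tuple["Tree", "Tree"]]


def bracketings(nouns: Tuple[str, ...]) -> List[Tree]:
    """Return every binary bracketing of a sequence of nouns (hypothetical helper)."""
    if len(nouns) == 1:
        return [nouns[0]]
    results: List[Tree] = []
    # Try every split point; combine all left sub-bracketings with all right ones.
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                results.append((left, right))
    return results


def show(tree: Tree) -> str:
    """Render a bracketing with square brackets, e.g. [[stock market] index]."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{show(left)} {show(right)}]"


if __name__ == "__main__":
    # Hypothetical three-noun compound: two readings.
    for tree in bracketings(("stock", "market", "index")):
        print(show(tree))
```

Running it prints `[stock [market index]]` and `[[stock market] index]`; a grammar such as the ERG must choose among readings like these, which is why disambiguation support matters for annotation.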
Similar resources
Coordination Structure Analysis using Dual Decomposition
Coordination disambiguation remains a difficult sub-problem in parsing despite the frequency and importance of coordination structures. We propose a method for disambiguating coordination structures. In this method, dual decomposition is used as a framework to take advantage of both HPSG parsing and coordinate structure analysis with alignment-based local features. We evaluate the performance o...
Syntactic Verifier as a Filter to Compound Unit Recognizer
This paper describes the combination of a compound unit (CU) recognizer with a syntactic verifier that uses a partial parsing mechanism. The recognizer finds all the CUs (combined concepts including collocations, idioms, and compound nouns) in an input sentence. CU information reduces the search space of syntactic analysis and a portion of Part-Of-Speech (POS) ambiguities. Syntactic verification is to obtain pr...
A Pattern-Based Approach Using Compound Unit Recognition and Its Hybridization with Rule-Based Translation
This paper describes a compound unit (CU) recognizer as a pattern-based approach and its hybridization with rule-based translation. A compound unit is a combined concept including collocations, idioms, and compound nouns. CU recognition reduces part of speech ambiguities by combining several words into a unit and consequently lessening the parsing load. It also provides pretranslated natural eq...
Annotating Wall Street Journal Texts Using a Hand-Crafted Deep Linguistic Grammar
This paper presents an on-going effort which aims to annotate the Wall Street Journal sections of the Penn Treebank with the help of a hand-written large-scale and wide-coverage grammar of English. In doing so, we are not only focusing on the various stages of the semi-automated annotation process we have adopted, but we are also showing that rich linguistic annotations, which can apart from sy...
Anaphoric Annotation in the ARRAU Corpus
Arrau is a new corpus annotated for anaphoric relations, with information about agreement and explicit representation of multiple antecedents for ambiguous anaphoric expressions and discourse antecedents for expressions which refer to abstract entities such as events, actions and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 cor...
Publication date: 2010